Part 1. Linear Models: Understanding relationships between decisions

Chapter 1. Considering the world algebraically

This is a book about building models. In that respect, it differs from other books about decision modeling: its focus is almost entirely on the construction and use of models rather than on the mathematical business of solving them. While the latter endeavor is a rigorous and difficult field of scientific inquiry, the former is more of an art form, with sometimes less well-defined ideas of right and wrong. It is thus, at least in our view, a bit more fun.

But perhaps we are already getting ahead of ourselves. Let us step back and talk about models in more general terms. As far as we are concerned, a model is an abstraction of reality that we use to understand and make decisions about the world. Our world is an increasingly complex place, and models are often the only tools we have to comprehend it. Done properly, a fairly simple model will allow one to take reasonable actions in the face of great uncertainty or incomprehensible complexity. Done poorly, even the most complete model will lead one far astray, a fact for which there are always fresh examples.

There are many different kinds of models. An architect's rendering models proposed buildings and their surrounding physical world. Among other aims, it attempts to show how a structure might fit into an existing space. A statistical model may try to show how one sample differs from another, for instance how those who take a drug of interest differ in health from those who do not. Or it may predict the likelihood of something occurring, such as rain on a given day of the week.

We shall focus on certain types of models which can be described algebraically. That is, we can fully describe them using variables, equations and inequalities, and an objective. While that may sound rather specific, we'll find that it is actually quite general and that a great number of problems can be solved using this form.

One of the advantages of working within general-purpose modeling frameworks is that they force us to be precise. This precision allows us to use a rich set of software tools based on well-developed, advanced branches of mathematics. It also allows us to focus on the structure of our problems rather than their implementation. We'll find that a solid understanding of such techniques fits well with other types of analysis and greatly broadens our abilities as data scientists.

1.1. Representing decisions as variables

Before we design and build models to represent the systems around us, we must first discuss the components with which we plan to construct those models. This is akin to opening up and examining one's toolbox before beginning a carpentry project. Let's take a minute to make sure our hammer and screwdriver are in working order first.

We wish to speak in the language of algebraic models. To that end we will learn its various parts, or components, and their intended functions. These parts, and the rules governing them, form a simple language with an almost trivial grammar, but they can be combined in limitless, sometimes unexpected ways.

As in any field of mathematics, the most fundamental component of a mathematical model is the variable. Many texts call this a "decision variable." In certain ways that name is appropriate. A decision variable represents the realm of possibilities for a single action we can take in some system. Variables can represent anything to which we assign a quantity: shares of a stock to purchase, teaspoons of salt to use in a recipe, or whether or not to travel along a particular road. Or they might represent something more abstract, such as the amount of error between an observed and fitted value in a statistical inference.

Most of the time, we combine many different variables together into a single problem. For instance, say we are writing a schedule for our time next Monday. We want to represent the hours we plan to spend on various activities. To start with, we choose three broad categories: sleep, work, and leisure. We represent each of them with the letter $x$ and a subscript denoting which activity that particular $x$ refers to.

$$x_s = \text{hours spent sleeping}$$
$$x_w = \text{hours spent working}$$
$$x_l = \text{hours spent lollygagging}$$

Thus we have three decision variables, each in units of hours. If $x_s = 8$, then we plan to spend eight hours sleeping. The activities these variables represent belong to the same set. That is, they are all things we can spend time doing.

Since $s$, $w$, and $l$ are the only activities in our set, we may opt to simplify our notation. Instead of referring to individual variables, we can refer to the vector $x$, which contains $x_s$, $x_w$, and $x_l$. The notation below shows $x$ as a column vector with three rows.

$$x = \left( \begin{array}{c} x_s \\ x_w \\ x_l \end{array} \right)$$

This is particularly convenient when we need to represent an arbitrary number of decisions. For instance, we may build a model using the three activities given above, and then extend it to include any activity our life coach enters into a database. Our model will not know a priori that our coach wants us to spend time meditating, crafting, and sweating through hot yoga classes. But we can still capture the structure of our model without knowing what the individual decisions will be. We simply state that there will be some positive integer number $n$ of activities to make decisions about, and that our $x$ vector contains them all.

$$x = \left( \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right)$$

Of course, this notation becomes a bit more tidy if we use the transpose of $x$. The following representation is equivalent.

$$x = \left( \begin{array}{c c c c} x_1 & x_2 & \dots & x_n \end{array} \right)^\intercal$$

The transpose of a vector $x$ is written $x^\intercal$. Transposing a vector changes it from a column vector to a row vector or vice versa. In this book we will be dealing exclusively with column vectors, unless otherwise stated. We will often show them as transposed row vectors simply to save space.
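To make the row/column distinction concrete, here is a small Python sketch. It stores a column vector as a list of one-element rows and defines a hypothetical `transpose` helper. Plain nested lists are our own assumption, chosen to keep the example free of dependencies; in practice a numerical library would normally handle this.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*matrix)]

# Column vector x = (x_s, x_w, x_l)^T: three rows, one column each.
x_column = [[8], [10], [6]]

# Its transpose is a single row with three columns.
x_row = transpose(x_column)
print(x_row)                          # [[8, 10, 6]]

# Transposing twice returns the original column vector.
print(transpose(x_row) == x_column)   # True
```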

1.2. Setting limits on activities

Perhaps the first question we should ask about any decision is: which values are reasonable for it, and which are not? Often, decision variables inhabit a range of possible values. We would like to constrain our variables so that they can only take on values within their assigned ranges. We define these ranges using upper bounds and lower bounds.

In the case of our variables $x_s$, $x_w$, and $x_l$, a couple of obvious bounds come to mind. Since they are defined as the number of hours we intend to spend sleeping, working, and at leisure in one day, none of them can exceed 24. This gives us an easy upper bound on the decision variables.

$$x_s, x_w, x_l \le 24$$

Perhaps less obvious is their lower bound. For this, we consider that there isn't much meaning in setting any of our variables to a value less than zero. Saying we're going to sleep for -4 hours would at best get us a few funny looks. Since it isn't something we know how to accomplish, we'll go ahead and require that all our variables be nonnegative. Now we have both upper and lower bounds.

$$0 \le x_s, x_w, x_l \le 24$$

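Alternatively, we can overload $\le$ to mean componentwise inequality and treat the scalars 0 and 24 as vectors whose components all equal that scalar. This lets us express the same thing more succinctly.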

$$0 \le x \le 24$$
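The componentwise reading of $0 \le x \le 24$ can be checked in a few lines of Python. In this sketch `within_bounds` is a hypothetical helper name, and the vector is a plain list: the same scalar bounds are applied to every component.

```python
def within_bounds(x, lower, upper):
    """Componentwise check of lower <= x <= upper for scalar bounds.

    The same scalar bound applies to every component of x, which is
    what the shorthand 0 <= x <= 24 expresses.
    """
    return all(lower <= xi <= upper for xi in x)

print(within_bounds([8, 10, 6], 0, 24))   # True
print(within_bounds([-4, 20, 8], 0, 24))  # False: -4 hours of sleep
print(within_bounds([8, 10, 30], 0, 24))  # False: 30 hours exceeds a day
```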

At first glance, nonnegativity may appear an odd requirement for decision variables. But we'll see that many models make little sense without it. Often, we are modeling actions that can be taken, and taking action maps quite naturally to the realm of nonnegative numbers. Doing something for a negative amount of time has as little meaning as producing a negative number of widgets or sending a negative number of trains along some route. These are actions that cannot reasonably be taken.

We will see later that variables are not always required to be nonnegative. However, models with such variables tend to involve tasks such as data fitting and do not relate directly to decision making.

Returning to our schedule variables, we may decide that we are unwilling to sleep less than seven hours a night. Further, our manager tells us we'll be fired if we work less than ten hours. While it is entirely possible that we may sleep for only six hours, or work for nine, we are rejecting those possibilities up front. Any set of values for our decision variables where $x_s < 7$ or $x_w < 10$ will be deemed infeasible and not considered. The same can be said when any of the variables are less than zero or greater than 24.

$$7 \le x_s \le 24$$
$$10 \le x_w \le 24$$
$$0 \le x_l \le 24$$

These variable bounds can also be described using vector notation, as shown below.

$$\left( \begin{array}{c c c} 7 & 10 & 0 \end{array} \right)^\intercal \le x \le 24$$
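The same variable-specific bounds can be checked in Python. This sketch stores a lower and an upper bound for each activity in dictionaries (our own choice of representation). A hypothetical `bound_violations` helper then reports which variables fall outside their bounds, so that a rejected schedule explains itself.

```python
# Per-variable bounds from the text: 7 <= x_s, 10 <= x_w, 0 <= x_l, all <= 24.
lower = {"s": 7, "w": 10, "l": 0}
upper = {"s": 24, "w": 24, "l": 24}

def bound_violations(x):
    """Return the names of variables whose values fall outside their bounds."""
    return [name for name, value in x.items()
            if not lower[name] <= value <= upper[name]]

print(bound_violations({"s": 8, "w": 10, "l": 6}))  # []
print(bound_violations({"s": 6, "w": 9, "l": 9}))   # ['s', 'w']
```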

It is important to note that we will consider only inequalities of the form $\le$ and $\ge$, not $<$ and $>$. While this is mathematically convenient, more importantly it removes the difficulty of dealing with arbitrary closeness. For instance, if we bound $x_s$ from below by requiring $x_s > 7$, then for any value of $x_s$ satisfying that bound, there is always another value closer to 7 that also satisfies it. The bound has more meaning to us if we use the form $x_s \ge 7$.
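The midpoint gives a concrete witness. Take any value of $x_s$ that satisfies the strict bound. The midpoint between that value and 7 satisfies the same inequality and lies closer to 7:

$$x_s > 7 \;\Longrightarrow\; 7 < \frac{x_s + 7}{2} < x_s$$

Under the strict bound, then, no smallest value of $x_s$ exists. Under $x_s \ge 7$, the value $x_s = 7$ is itself allowed.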

1.3. Modeling relationships of activities

The requirements of feasibility extend beyond merely satisfying our variables' bounds. Variables that consume the same resource interact with each other, and values that each lie within their bounds can still be infeasible together. In our Monday scheduling example, we have one resource to allocate: time. All three of our variables consume time, and to fully describe the set of possible schedules we must include this interaction in our model.

To illustrate, say we arbitrarily choose to spend ten hours sleeping, ten hours working, and the remaining four hours in leisurely pursuits. In this schedule, our variables take the values $x_s = 10$, $x_w = 10$, and $x_l = 4$. We try this schedule and decide that it simply doesn't give us enough leisure time. We would rather set $x_l$ to a higher value.

We now face a dilemma. We cannot simply increase the amount of time spent on leisure, as that would mean allocating more than 24 hours of activity. Instead, we must take time away from one of the other two activities: sleep or work. We are bound by this resource of time, and changing the value of one variable may affect the other variables that use time. Writing this algebraically adds the following constraint to our model.

$$x_s + x_w + x_l = 24$$

If we have an unknown or large number of activities, it may be more convenient to express this constraint as a summation.

$$\sum_{i \in \{s, w, l\}} x_i = 24$$

We may also see constraints of this form written using the dot product.

$$\textbf{1}^\intercal x = 24$$

Here $\textbf{1}$ is an $n \times 1$ column vector of ones, with $n = 3$ in our example. $\textbf{1}^\intercal x$ is the dot product of this vector and $x$, which amounts to summing the components of $x$. All three notations given above are equivalent.
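As a quick check that the three notations agree, this Python sketch evaluates each of them for one example schedule, $x = (8, 10, 6)^\intercal$. The schedule is chosen purely for illustration.

```python
# Illustrative schedule: x_s = 8, x_w = 10, x_l = 6.
x = {"s": 8, "w": 10, "l": 6}
index_set = ("s", "w", "l")

# 1. Explicit sum: x_s + x_w + x_l
explicit = x["s"] + x["w"] + x["l"]

# 2. Summation over the index set {s, w, l}
summation = sum(x[i] for i in index_set)

# 3. Dot product 1^T x with a vector of ones of matching length
values = [x[i] for i in index_set]
ones = [1] * len(values)
dot = sum(a * b for a, b in zip(ones, values))

print(explicit, summation, dot)  # 24 24 24
```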

1.4. Choosing between potential decisions

We will call any assignment of values to our $x$ vector that satisfies both our constraints and variable bounds a feasible solution, or just a solution for short. A bit of experimentation will show that there are any number of potential solutions to this problem. We might set $x = \left( \begin{array}{c c c} 8 & 10 & 6 \end{array} \right)^\intercal$ or perhaps $x = \left( \begin{array}{c c c} 7 & 10.5 & 6.5 \end{array} \right)^\intercal$. This leads us to the question of which solution is best. That is, given all feasible solutions, which one do we believe is optimal?
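The definition of a feasible solution can be written as a small Python check. This sketch rests on our own assumptions:

- schedules are dictionaries keyed by activity;
- the bounds are the ones from Section 1.2;
- a small tolerance `tol` (our choice) guards the time constraint against floating-point rounding.

```python
import math

# Variable bounds from Section 1.2 and the length of the day.
LOWER = {"s": 7, "w": 10, "l": 0}
UPPER = {"s": 24, "w": 24, "l": 24}
HOURS_IN_DAY = 24

def is_feasible(x, tol=1e-9):
    """Return True if schedule x satisfies its bounds and the time constraint."""
    in_bounds = all(LOWER[i] - tol <= x[i] <= UPPER[i] + tol for i in LOWER)
    # x_s + x_w + x_l = 24, compared with a tolerance for floating-point sums
    total = sum(x[i] for i in LOWER)
    uses_whole_day = math.isclose(total, HOURS_IN_DAY, abs_tol=tol)
    return in_bounds and uses_whole_day

print(is_feasible({"s": 8, "w": 10, "l": 6}))      # True
print(is_feasible({"s": 7, "w": 10.5, "l": 6.5}))  # True
print(is_feasible({"s": 9, "w": 10, "l": 7}))      # False: 26 hours
print(is_feasible({"s": 6, "w": 10, "l": 8}))      # False: x_s < 7
```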

To make any sort of rational choice between our myriad options, we must quantify the value of our activities. Let's say that we have a rough idea how happy these activities make us in relation to each other, and that our goal is to maximize our happiness. We choose the values 1.5, 1, and 2 for the happiness an hour of sleep, work, and leisure gives us, respectively. Note that we aren't putting any units on these numbers, and that's OK for now. In future examples we'll use units such as currency or time, but units are not always necessary.

We can use these numbers to create an objective function. In this case, our objective is to maximize our happiness over the set of all feasible schedules. That is, if we were to evaluate every possible schedule, we would like to make note of the one that makes us happiest. We write our objective as a simple function preceded by $\max$ to indicate we are maximizing the objective.

$$\max z = 1.5 x_s + x_w + 2 x_l$$

By convention, we assign the value of our objective to a new variable called $z$.
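The idea of evaluating every possible schedule can be mirrored in a short Python sketch. It scores schedules with $z$ and enumerates all feasible schedules on a half-hour grid. The grid is an assumption made so that the set of schedules is finite; it is an illustration only, not how solution methods for such models work. Because $x_l$ is determined by the time constraint, only $x_s$ and $x_w$ need to be enumerated.

```python
# Happiness per hour of sleep, work, and leisure, as chosen in the text.
HAPPINESS = {"s": 1.5, "w": 1.0, "l": 2.0}

def objective(x):
    """Evaluate z = 1.5 x_s + x_w + 2 x_l for a schedule x."""
    return sum(HAPPINESS[i] * x[i] for i in HAPPINESS)

# Feasible schedules on a half-hour grid (an illustrative assumption).
# a and b count half hours, so x_s = a / 2 and x_w = b / 2;
# x_l is fixed by the time constraint x_s + x_w + x_l = 24.
schedules = [
    {"s": a / 2, "w": b / 2, "l": 24 - a / 2 - b / 2}
    for a in range(14, 49)   # 7 <= x_s <= 24
    for b in range(20, 49)   # 10 <= x_w <= 24
    if a + b <= 48           # keeps x_l >= 0
]

best = max(schedules, key=objective)
print(objective({"s": 8, "w": 10, "l": 6}))  # 34.0
print(best, objective(best))                 # {'s': 7.0, 'w': 10.0, 'l': 7.0} 34.5
```

Both example schedules from Section 1.4 score $z = 34$. The happiest schedule on the grid puts $x_s$ and $x_w$ at their lower bounds and gives the rest of the day to leisure. Substituting $x_l = 24 - x_s - x_w$ shows why: $z = 48 - 0.5 x_s - x_w$, which falls as either sleep or work increases.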

Maximizing is not our only option. We could also minimize some objective, such as the cost of a schedule or the likelihood of being fired. Maximizing an objective and minimizing its negative select the same optimal solutions. Only the sign changes: the optimal value of one is the negative of the optimal value of the other. That is, the following is equivalent to our objective above.

$$\min w = -1.5 x_s - x_w - 2 x_l$$

Again, the choice of $w$ as our objective variable is by convention. Deciding between writing an objective function as a maximization or minimization is mostly a matter of what seems natural for a particular model. We often want to maximize results we see as positive, such as profits, and minimize things that are negative, such as costs.
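The equivalence can be checked numerically. This Python sketch enumerates feasible schedules on a half-hour grid, which is an assumption made for illustration. It confirms that maximizing $z$ and minimizing $w$ select the same schedule, and that the two optimal values differ only in sign.

```python
HAPPINESS = {"s": 1.5, "w": 1.0, "l": 2.0}

def z(x):
    """Maximization objective: z = 1.5 x_s + x_w + 2 x_l."""
    return sum(HAPPINESS[i] * x[i] for i in HAPPINESS)

def w(x):
    """Minimization objective: w = -1.5 x_s - x_w - 2 x_l, the negative of z."""
    return -z(x)

# Feasible schedules on a half-hour grid (an illustrative assumption);
# x_l is fixed by the time constraint x_s + x_w + x_l = 24.
schedules = [
    {"s": a / 2, "w": b / 2, "l": 24 - a / 2 - b / 2}
    for a in range(14, 49)   # 7 <= x_s <= 24
    for b in range(20, 49)   # 10 <= x_w <= 24
    if a + b <= 48           # keeps x_l >= 0
]

best_for_max = max(schedules, key=z)
best_for_min = min(schedules, key=w)

print(best_for_max == best_for_min)      # True: same optimal schedule
print(z(best_for_max), w(best_for_min))  # 34.5 -34.5
```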
